Spatial weights are mathematical structures used to represent spatial relationships. They characterize the relationship of each observation to every other observation using some concept of proximity or closeness that depends on the weight type.
They can be built in PySAL from shapefiles, as well as from several other spatial data file types.
In [1]:
import pysal as ps
import numpy as np
There are functions to construct weights directly from a file path.
In [2]:
shp_path = ps.examples.get_path('NAT.shp')
A commonly-used type of weight is a queen contiguity weight, which reflects adjacency relationships as a binary indicator variable denoting whether or not a polygon shares an edge or a vertex with another polygon. These weights are symmetric, in that when polygon $A$ neighbors polygon $B$, both $w_{AB} = 1$ and $w_{BA} = 1$.
To construct queen weights from a shapefile, use the queen_from_shapefile function:
In [3]:
qW = ps.queen_from_shapefile(shp_path)
dataframe = ps.pdio.read_files(shp_path)
In [4]:
qW
Out[4]:
All weights objects have a few traits that you can use to work with the weights object, as well as to get information about the weights object.
To get the neighbors & weights around an observation, use the observation's index on the weights object, like a dictionary:
In [5]:
qW[4] #neighbors & weights of the 5th observation
Out[5]:
By default, the weights and the pandas dataframe will use the same index. So, we can view the observation and its neighbors in the dataframe by putting the observation's index and its neighbors' indexes together in one list:
In [6]:
self_and_neighbors = [4]
self_and_neighbors.extend(qW.neighbors[4])
print(self_and_neighbors)
and grabbing those elements from the dataframe:
In [7]:
dataframe.loc[self_and_neighbors]
Out[7]:
A full, dense matrix describing all of the pairwise relationships is constructed using the .full method, or when pysal.full is called on a weights object:
In [8]:
Wmatrix, ids = qW.full()
#Wmatrix, ids = ps.full(qW)
In [9]:
Wmatrix
Out[9]:
Note that this matrix is binary, in that its elements are either zero or one, since an observation is either a neighbor or it is not a neighbor.
However, many common use cases of spatial weights require that the matrix is row-standardized. This is done simply in PySAL using the .transform attribute:
In [10]:
qW.transform = 'r'
Now, if we build a new full matrix, its rows should sum to one:
In [11]:
Wmatrix, ids = qW.full()
In [12]:
Wmatrix.sum(axis=1) # axis=1 sums across the columns within each row, giving the row totals
Out[12]:
Since weight matrices are typically very sparse, there is also a sparse weights matrix constructor:
In [13]:
qW.sparse
Out[13]:
By default, PySAL assigns each observation an index according to the order in which the observation was read in. This means that, by default, all of the observations in the weights object are indexed by table order. If you have an alternative ID variable, you can pass that into the weights constructor.
For example, the NAT.shp dataset has a possible alternative ID variable, a FIPS code.
In [14]:
dataframe.head()
Out[14]:
The observation we were discussing above is in the fifth row: Pend Oreille county, Washington. Note that its FIPS code is 53051.
Then, instead of indexing the weights and the dataframe just based on read-order, use the FIPS code as an index:
In [15]:
qW = ps.queen_from_shapefile(shp_path, idVariable='FIPS')
Now, Pend Oreille county has a different index:
In [16]:
qW[4] #fails, since no FIPS is 4.
Note that a KeyError in Python usually means that some index, here 4, was not found in the collection being searched: the IDs in the queen weights object. This makes sense, since we explicitly passed an idVariable argument, and nothing has a FIPS code of 4.
Instead, if we use the observation's FIPS code:
In [17]:
qW['53051']
Out[17]:
We get what we need.
In addition, we now have to query the dataframe using the FIPS code to find our neighbors. But this is relatively easy to do, since pandas can resolve references to Python objects inside a query string when told to.
First, let us store the neighbors of our target county:
In [18]:
self_and_neighbors = ['53051']
self_and_neighbors.extend(qW.neighbors['53051'])
Then, we can use this list in .query:
In [19]:
dataframe.query('FIPS in @self_and_neighbors')
Out[19]:
Note that we have to use @ before the name in order to show that we're referring to a python object and not a column in the dataframe.
In [20]:
#dataframe.query('FIPS in self_and_neighbors') will fail because there is no column called 'self_and_neighbors'
Of course, we could also reindex the dataframe to use the same index as our weights:
In [21]:
fips_frame = dataframe.set_index(dataframe.FIPS)
fips_frame.head()
Out[21]:
Now that both are using the same index, we can use the .loc indexer again:
In [22]:
fips_frame.loc[self_and_neighbors]
Out[22]:
Rook weights are another type of contiguity weight, but consider observations as neighboring only when they share an edge. The rook neighbors of an observation may be different than its queen neighbors, depending on how the observation and its nearby polygons are configured.
We can construct these in the same way as the queen weights, using the rook_from_shapefile function:
In [23]:
rW = ps.rook_from_shapefile(shp_path, idVariable='FIPS')
In [24]:
rW['53051']
Out[24]:
These weights function exactly like the Queen weights, and are only distinguished by what they consider "neighbors."
In theory, a "Bishop" weighting scheme is one that arises when only polygons that share vertices are considered to be neighboring. But, since Queen contiguity requires either a shared edge or a shared vertex and Rook contiguity requires only shared edges, the following relationship is true:
$$ \mathcal{Q} = \mathcal{R} \cup \mathcal{B} $$
where $\mathcal{Q}$ is the set of neighbor pairs via queen contiguity, $\mathcal{R}$ is the set of neighbor pairs via Rook contiguity, and $\mathcal{B}$ via Bishop contiguity. Thus:
$$ \mathcal{Q} \setminus \mathcal{R} = \mathcal{B} $$
Bishop weights entail all Queen neighbor pairs that are not also Rook neighbors.
PySAL does not have a dedicated bishop weights constructor, but you can construct them very easily using the w_difference function. This function is one of a family of tools to work with weights, all defined in ps.weights, that conduct these types of set operations between weight objects.
In [25]:
bW = ps.w_difference(qW, rW, constrained=False, silent_island_warning=True) #silence because there will be a lot of warnings
In [26]:
bW.histogram
Out[26]:
Thus, the vast majority of counties have no bishop neighbors. But, a few do. A simple way to see these observations in the dataframe is to find all elements of the dataframe that are not "islands," the term for an observation with no neighbors:
In [27]:
islands = bW.islands
In [28]:
dataframe.query('FIPS not in @islands')
Out[28]:
There are many other kinds of weighting functions in PySAL. Another type uses a continuous measure of distance to define neighborhoods. To use these measures, we first must extract the polygons' centroids.
For each polygon poly in dataframe.geometry, we want poly.centroid. So, one way to do this is to make a list of all of the centroids:
In [47]:
centroids = [list(poly.centroid) for poly in dataframe.geometry]
In [48]:
centroids[0:5] #let's look at the first five
Out[48]:
If we were working with point data, this step would be unnecessary.
If we wanted to consider only the k-nearest neighbors to an observation's centroid, we could use the knnW function in PySAL.
This specific type of distance weights requires that we first build a KDTree, a special representation for spatial point data. Fortunately, this is built in to PySAL:
In [50]:
kdtree = ps.cg.KDTree(centroids)
Then, we can use this to build a spatial weights object where only the closest k observations are considered "neighbors." In this example, let's do the closest 5:
In [51]:
nn5 = ps.knnW(kdtree, k=5)
In [52]:
nn5.histogram
Out[52]:
So, all observations have exactly 5 neighbors. Sometimes, these neighbors are different from the ones identified by contiguity weights.
For example, Pend Oreille gets a new neighbor, Kootenai county:
In [53]:
nn5[4]
Out[53]:
In [54]:
dataframe.loc[nn5.neighbors[4] + [4]]
Out[54]:
In [55]:
fips_frame.loc[qW.neighbors['53051'] + ['53051']]
Out[55]:
Kernel Weights are continuous distance-based weights that use kernel densities to provide an indication of neighborliness.
Typically, they estimate a bandwidth, which is a parameter governing how far out observations should be considered neighboring. Then, using this bandwidth, they evaluate a continuous kernel function to provide a weight between 0 and 1.
Many different choices of kernel functions are supported, and bandwidth can be estimated at each point or over the entire map.
For example, if we wanted to use a single estimated bandwidth for the entire map and weight according to a gaussian kernel:
In [68]:
kernelW = ps.Kernel(centroids, fixed=True, function='gaussian')
#ps.Kernel(centroids, fixed=False, function='gaussian') #same kernel, but bandwidth changes at each observation
In [69]:
dataframe.loc[kernelW.neighbors[4] + [4]]
Out[69]:
As you can see, this provides our target observation, Pend Oreille, with many new neighbors.